Exact mapping of prokaryotic gene starts
Abstract
Programs used to find genes in prokaryotic genomes reliably map protein-coding regions, but they often fail to determine gene starts exactly. This problem is further aggravated by sequencing errors, most notably insertions and deletions leading to frame-shifts. Therefore, the exact mapping of gene starts and the identification of frame-shifts are important problems in the computer-assisted functional analysis of newly sequenced genomes. Here we review methods of gene recognition and describe a new algorithm for the correction of gene starts and identification of frame-shifts in prokaryotic genomes. The algorithm is based on the comparison of nucleotide and protein sequences of homologous genes from related organisms, using the assumption that the rate of evolutionary change in protein-coding regions is lower than that in non-coding regions. A dynamic programming algorithm is used to align protein sequences obtained by formal translation of genomic nucleotide sequences. The possibility of frame-shifts is taken into account. The algorithm was tested on several groups of related organisms: gamma-proteobacteria, the Bacillus/Clostridium group, and three Pyrococcus genomes. The testing demonstrated that, depending on the genome, 1-10 per cent of genes have incorrect starts or contain frame-shifts. The algorithm is implemented in the program package Orthologator-GeneCorrector.
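The "formal translation" step and the ambiguity of gene starts can be illustrated with a minimal Python sketch. This is not the Orthologator-GeneCorrector implementation: the function names `translate` and `candidate_starts` are hypothetical, the standard genetic code is assumed, and the start codon set (ATG, GTG, TTG) is an assumption about common prokaryotic starts. It shows only how a nucleotide sequence is translated in a chosen reading frame and why one stop codon can be paired with several alternative in-frame starts.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code listed in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Assumption: the usual prokaryotic start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate(seq, frame=0):
    """Formally translate seq in reading frame 0, 1 or 2.

    Codons containing unknown characters are rendered as 'X'.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def candidate_starts(seq, stop_pos):
    """List in-frame start codon positions upstream of the stop at stop_pos.

    The scan walks backwards codon by codon and ends at the previous
    in-frame stop codon, so every returned position opens a reading
    frame that terminates at stop_pos.
    """
    seq = seq.upper()
    starts = []
    for i in range(stop_pos - 3, -1, -3):
        codon = seq[i:i + 3]
        if CODON_TABLE.get(codon) == "*":
            break
        if codon in START_CODONS:
            starts.append(i)
    return sorted(starts)


if __name__ == "__main__":
    dna = "TAAGTGAAAATGCCCTTGGGGTAA"
    print(translate(dna))             # protein string for frame 0
    print(candidate_starts(dna, 21))  # alternative starts for the stop at 21
```

In this toy sequence three in-frame start codons precede the same stop, so the open reading frame alone does not say which one is the true start. The approach in the abstract resolves such cases by comparison with homologous genes from related genomes; this sketch only enumerates the candidates.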
Related articles
Starts of bacterial genes: estimating the reliability of computer predictions.
Exact mapping of gene starts is an important problem in the computer-assisted functional analysis of newly sequenced prokaryotic genomes. We describe an algorithm for finding ribosomal binding sites without a learning sample. This algorithm is particularly useful for the analysis of genomes with few or no experimentally mapped genes. There is a clear correlation between the ribosomal binding sit...
GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions.
Improving the accuracy of prediction of gene starts is one of a few remaining open problems in computer prediction of prokaryotic genes. Its difficulty is caused by the absence of relatively strong sequence patterns identifying true translation initiation sites. In the current paper we show that the accuracy of gene start prediction can be improved by combining models of protein-coding and non-...
GeneMarkS-2: Raising Standards of Accuracy in Gene Recognition
Motivation: Ab initio gene prediction in prokaryotic genomes is supposed to be so accurate that RNASeq data are rarely produced to bring in an additional layer of evidence. In 2016 more than 60,000 prokaryotic genomes were re-annotated by the NCBI pipeline. Given the sheer volume of prokaryotic DNA data flowing from next generation sequencing facilities into genome databases, the annotation acc...
Prokaryotic expression, purification and immunogenicity analysis of CpsD protein from Streptococcus iniae
Streptococcus iniae is a major cause of serious bacterial infections in both fish and human beings. The capsular polysaccharide (CPS) of S. iniae is vital for evading phagocytic clearance by the host and serves as an important protective antigen against S. iniae infection in aquatic animals. The CpsD gene was determined to be highly conserved in the capsular polysaccharide operon. Prokaryotic expression of t...
Challenging for Expression Bovine Rotavirus (RF Strain) Full-Length VP7 Protein in Prokaryotic System
Background and Aims: Rotavirus enteritis is an acute viral infectious disease among infants. The VP7 protein has a key role in the attachment and entry of the virus into the target cell. It is also involved in inducing the production of neutralizing antibodies that protect infants against reinfection by the virus. The aim of this study was the heterologous expression of the VP7 gene of bovine rotavirus...
Journal title:
- Briefings in bioinformatics
Volume 3, Issue 2
Pages -
Publication date 2002